Monolingual and Bilingual Concept Visualization from Corpora

Authors

  • Dominic Widdows
  • Scott Cederberg
Abstract

As well as identifying relevant information, a successful information management system must be able to present its findings in terms which are familiar to the user, which is especially challenging when the incoming information is in a foreign language (Levow et al., 2001). We demonstrate techniques which attempt to address this challenge by placing terms in an abstract ‘information space’ based on their occurrences in text corpora, and then allowing a user to visualize local regions of this information space. Words are plotted in a 2-dimensional picture so that related words are close together and whole classes of similar words occur in recognizable clusters which sometimes clearly signify a particular meaning. As well as giving a clear view of which concepts are related in a particular document collection, this technique also helps a user to interpret unknown words.
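
The abstract does not say how the "information space" is built or how it is reduced to two dimensions, so the following is only a minimal sketch under stated assumptions: words are represented by windowed co-occurrence counts, similarity is measured by cosine, and the 2-dimensional picture comes from a simple principal component projection. The toy corpus and the names `cooccurrence_vectors`, `cosine`, and `project_2d` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def _leading_direction(rows, iterations=200):
    """Power iteration for the dominant principal direction of centered rows."""
    dim = len(rows[0])
    vec = [float(k + 1) for k in range(dim)]  # arbitrary deterministic start (assumption)
    scale = math.sqrt(sum(x * x for x in vec))
    vec = [x / scale for x in vec]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, vec)) for row in rows]
        new = [sum(s * row[k] for s, row in zip(scores, rows)) for k in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    return vec


def project_2d(vectors):
    """Project every word vector onto its first two principal directions."""
    words = sorted(vectors)
    contexts = sorted({c for vec in vectors.values() for c in vec})
    rows = [[float(vectors[w][c]) for c in contexts] for w in words]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    rows = [[x - m for x, m in zip(row, means)] for row in rows]
    pc1 = _leading_direction(rows)
    xs = [sum(a * b for a, b in zip(row, pc1)) for row in rows]
    # Remove the first direction so the second pass finds the next one.
    residual = [[x - s * p for x, p in zip(row, pc1)] for row, s in zip(rows, xs)]
    pc2 = _leading_direction(residual)
    ys = [sum(a * b for a, b in zip(row, pc2)) for row in residual]
    return {w: (x, y) for w, x, y in zip(words, xs, ys)}


# Toy corpus, invented for this example.
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the pilot flew the plane",
    "the captain flew the plane",
]
vectors = cooccurrence_vectors(corpus)
coords = project_2d(vectors)  # word -> (x, y), ready to hand to any plotting tool
```

In this toy corpus, "doctor" and "nurse" occur in identical contexts, so they land on the same point, while "pilot" shares only part of that context. A real system would need a much larger corpus, stopword handling, and a more careful dimensionality reduction than this sketch provides.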

Similar resources

Metalinguistic Awareness and Bilingual vs. Monolingual EFL Learners: Evidence from a Diagonal Bilingual Context

This paper reports a study of 85 Iranian EFL learners in the English Language Department of Urmia University. It explores the possible differences between performance of 38 Persian monolingual and 47 Turkish-Persian bilingual EFL learners on metalinguistic tasks of ungrammatical structures and translation. The underlying hypothesis is that bilinguals in diagonal bilingual contexts experience a ...

Synonymous Collocation Extraction Using Translation Information

Automatically acquiring synonymous collocation pairs from corpora is a challenging task. For this task, we can, in general, have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or use bilingual corpora alone are apparently inadequate because of low precision or low coverage. I...

Learning Crosslingual Word Embeddings without Bilingual Corpora

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools. However, previous attempts had expensive resource requirements, difficulty incorporating monolingual data or were unable to handle polysemy. We address these drawbacks in our method which takes advantage of a high coverage dictionary in an EM style training alg...

Towards producing bilingual lexica from monolingual corpora

Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learn bilingual lexica by training a supervised classifier using word embed...

Learning Bilingual Lexicons from Monolingual Corpora

We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context counts and orthographic substrings. Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. We show that hig...

Building bilingual terminologies from comparable corpora: the TTC TermSuite

In this paper, we exploit domain-specific comparable corpora to build bilingual terminologies. We present the monolingual term extraction and the bilingual alignment that will allow us to identify and translate highly specialised terminology. We stress the huge importance of taking into account both simple and complex terms in a multilingual environment. Such linguistic diversity implies to combi...

Publication date: 2003